Using Regular Expressions for Matching

11.11.2022

An Example of What I Could Add Next

I'm planning on making tutorials a feature of this blog. I have quite a few of them under development in various states of completion. You can see one that I put together earlier this year already posted on my GitHub pages.

Mozahler - FloodFill

At the bottom of this page you can see a partial list of other potential projects. If you would like to see one of them published, then drop me an email by using the link located below my avatar. Please share your thoughts.

My Current Project

In case you didn't check out my earlier tutorial on the Flood Fill algorithm, let me show you a portion of the project I'm currently working on. This is an excerpt/demo from a possible multi-part tutorial that uses Wordle as a demonstration app to discuss a number of Swift-related topics. This should give you some idea of my approach and you can decide if you like it or not.

In my Wordle tutorial I use the popular game as the focus of a project defined as a single Swift Package which tangentially covers a number of technologies, best practices, unit testing, Swift Animation, configuring Fastlane and more. Most important is that the end product is a complete app.

... An Excerpt from the Wordle Tutorial

For those who have studied Swift in school, but haven't had a chance to take things to the next level, this is a complete example describing how to create an iOS app, covering many bases - enough to get your app into the store.

In this extract/demo, I touch upon

- Regular Expressions and Pattern Matching
- Unit Tests in Xcode
- Using Swift Package Manager to modularize your work
- Organizing your code by applying the principles of Swift Composable Architecture (TCA)

Upon completing the tutorial you will have created a Swift Package containing all the code for a complete Wordle app (including tests), written using SwiftUI and relying on SwiftComposableArchitecture, that runs on a real device (iPhone or iPad) or the simulator.

The Color-Coded Tiles

I'm going to discuss some of the rules of the game here, but if you are unfamiliar with the gameplay, you should definitely play a few games yourself before reading further. None of this is very complicated, but you shouldn't be trying to conquer multiple concepts simultaneously.

The object of the game is to guess the solution - a five-letter word selected by the app when you start the game. You take a turn by typing in five letters (they have to form a real word, and that word has to be in the game's main list of vetted words). The NYT has curated a list which removes possibly offensive words (pussy) and has recently revised their rules to not allow simple plurals (3 letters plus es or 4 letters plus s). Geese for goose is still valid.

When you take a turn, or submit a guess by tapping on the submit button, the app cycles through your 5 tile choices, matching them against the solution. It colors the tiles based on whether and where the letter appears in the solution. By applying what you know about the color category, you can mentally narrow down the list of possible solutions. The object of the game is to guess the solution in as few turns as possible. And you're only allowed six chances to do so.

Let's take a minute to understand how a tile turns a particular color before coming up with code that puts it all together.

The Yellow Tile

A yellow tile signifies that the letter is represented in the solution, but is not in the correct position.

This, surprisingly, gives you a lot of information. It can be a bit frustrating to apply it, though. We know that if a tile is yellow, this particular letter appears at least once in the final solution. Knowing that it is in the wrong place is also useful, if, as I mentioned, a bit frustrating. If there are 3 yellows and 2 greens, then you know you have all the letters you need for the solution, and can avoid wasting a turn by trying out new letters.

The Black Tile

A black tile signifies that the letter does not appear in the solution. It is incredibly useful to know what letters aren't in the word. You can still submit the letter in any subsequent guesses (to help construct a word using unused letters - remember it has to be a valid word before you're allowed to submit it!). The problem is that there are 26 possible letters and you only get 6 attempts. You don't want to waste too much energy eliminating letters from consideration.

Personally, at the start of the game I like to submit two guesses using the most frequent letters in American English (including most of the vowels). This means I rarely solve a word in under 3 guesses. It also means that I have most of the vowels and possibly a few of the consonants by the third guess. Keep in mind that the game on the NYT is edited/curated which introduces a certain bias. And be aware that the most used first guess is Adieu.

The Green Tile

Green is gold in this context. A green tile is by far the most useful, and it signifies that you have the letter and the position exactly right.

It doesn't mean that you are done with this letter - think of double oo, ee, ll, mm, ss, not to mention non-consecutive letters, like banal or smash. But it gives you a chance to relax a bit as it confirms you are on the right track.

The most useful aspect of the green tile is that you also have positional information. Applying this information gives you the best opportunity to whittle down the size of the main list (pool of candidates). However, this also means processing this information takes a bit more ingenuity, or at least a better data structure than just a simple array of letters.
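The three colors described above can be sketched as a small scoring function. This is only my illustration of the rules, not code from the tutorial: the TileColor enum and the score(guess:solution:) function are hypothetical names, and the two-pass handling of repeated letters is my assumption about how the game treats them.

```swift
/// Hypothetical tile colors; the names are illustrative only.
enum TileColor { case green, yellow, black }

/// Scores a guess against the solution, one color per position.
/// Assumes both words have the same length and the same letter case.
func score(guess: String, solution: String) -> [TileColor] {
    let guessChars = Array(guess)
    let solutionChars = Array(solution)
    var colors = [TileColor](repeating: .black, count: guessChars.count)
    var unmatched: [Character: Int] = [:]

    // First pass: mark exact matches green and count the solution letters left over.
    for index in guessChars.indices {
        if guessChars[index] == solutionChars[index] {
            colors[index] = .green
        } else {
            unmatched[solutionChars[index], default: 0] += 1
        }
    }

    // Second pass: a non-green letter turns yellow while unmatched copies remain.
    for index in guessChars.indices where colors[index] != .green {
        let letter = guessChars[index]
        if let remaining = unmatched[letter], remaining > 0 {
            colors[index] = .yellow
            unmatched[letter] = remaining - 1
        }
    }
    return colors
}

let result = score(guess: "SMASH", solution: "SHARE")
// result is [.green, .black, .green, .black, .yellow]
```

The second S in SMASH stays black because the only S in SHARE was already used up by the green tile in the first position.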

Using Regular Expressions for Matching

If you're a mathematician, you might prefer to use the advanced capabilities of Swift's Regex type and the RegexBuilder DSL that were added with Swift 5.7 (Xcode 14+). For the rest of us (I got into computer science because I prefer letting the machine do the math), the minimal required knowledge will do. I'll go into what that means shortly.

Regular Expressions deserve a unit to themselves. Support for Regular Expressions is very robust in Swift, and for good reason. RegExes provide coded shortcuts to identify sequences of characters by using a mix of special wildcard characters along with the characters you wish to match against.

There are wildcard characters that will match your string against multiple occurrences of a subsequence in a row. You can specify that there should be exactly one match of the sequence in the string being checked. There's even a special character, the backslash (\), that says the next character in the sequence is not a wildcard.
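Here is a minimal sketch of those ideas using the same range(of:options:) call that appears later in this post. The matches helper is a name I made up for the illustration, and the sample strings are invented.

```swift
import Foundation

/// Hypothetical helper: true when `pattern` matches somewhere in `text`.
func matches(_ text: String, _ pattern: String) -> Bool {
    text.range(of: pattern, options: .regularExpression) != nil
}

let doubleO = matches("FLOOD", "O{2}")   // {2} asks for two O's in a row: true
let singleO = matches("FLOWN", "O{2}")   // only one O: false
let anyChar = matches("ABC", "A.C")      // . stands for any one character: true
let escaped = matches("ABC", "A\\.C")    // \. matches only a literal period: false
let literal = matches("A.C", "A\\.C")    // the period is really there: true
```

Note that the backslash has to be doubled inside a regular Swift string literal, because Swift itself also treats the backslash as an escape character.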

The Yellow Tiles

Let's take a look at how we might encode a sequence that includes three yellow tiles s, r and e.

Regular expressions contain a special sequence which is called lookahead. You precede the potential match with a question mark and equal sign, and surround the entire subsequence with parens.

It is special in that it will look at the pending match and see if it can find the sequence, but it doesn't eat the sequence (in other words, the sequence remains intact for other possible searches). This is perfect for taking our list of candidates and only including those with yellow matches.

So the three yellow letters SRE translate to:

  (?=.*s.*)(?=.*r.*)(?=.*e.*)
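To see the lookahead approach at work, here is a small sketch that filters an invented word list. I use uppercase letters to match the sample words, I drop the redundant .* after each letter, and I add a trailing .+ so that the overall match actually consumes the word rather than relying on how a zero-width match is reported. Those are my choices for the illustration, not a requirement.

```swift
import Foundation

// Sample words, invented for this illustration.
let candidates = ["SHARE", "STORE", "SLANT", "CRANE", "RAISE"]

// One lookahead per yellow letter; the trailing .+ consumes the word so the
// overall match is not zero-width.
let pattern = "^(?=.*S)(?=.*R)(?=.*E).+"

// Keep only the words that contain S, R and E somewhere.
let withAllThree = candidates.filter {
    $0.range(of: pattern, options: .regularExpression) != nil
}
// withAllThree is ["SHARE", "STORE", "RAISE"]
```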

If this is enough to get you started, then by all means go off and code a different solution from the one that follows.

This might be fine for some people, but I'm going to want to code something a bit less cryptic.

That's not to say I won't be using regular expressions in my solution. I just won't use such esoteric sequences. I also won't be looking at the entire sequence at once. I will cycle through each of the five positions in the string. This way I only have to match one character at a time. Our working domain of fewer than 5000 five-letter words doesn't demand the most efficient approach possible, and I prefer to work with code I can understand when I first view it.

The two special characters I will use to accomplish this are [ and ]. In the RegEx world this means any of the characters between the brackets can match. If I don't follow this sequence with any special characters to indicate frequency, it will only match against one character. Be aware that this is looser than the lookahead version: it matches a word that contains any one of the three letters, not necessarily all three. With that caveat, the previous definition for matching yellow sre is now represented as

[sre]

Let's say I have my yellow tile letters from my current turn and I want to remove all the words in the main list that don't contain these letters. After all, I'm trying to reduce the candidate pool with each turn.

All I need to do is wrap these letters inside square brackets and then filter the list using the regular expression just mentioned. The result is understandable Swift code.

In order to make that happen, I need to write this yellow() method.

    /// Returns all items that contain any of the specified letters.
    /// (range(of:options:) requires `import Foundation`.)
    func yellow(_ yellowChars: String = "[SRE]") -> [String] {
        mainList.filter({
            $0.range(of: yellowChars,
                     options: .regularExpression) != nil
        })
    }

Using the filter method on the Array<String>, I iterate through all the words in the main list and only return those words that contain at least one of these letters (the result of the match is non-nil).
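Here is a self-contained sketch you can paste into a playground to try the method out. The WordPool struct is a hypothetical stand-in for wherever mainList lives in the real project, and the sample words are invented. The loop at the end is an assumed variation, not part of the method above: it shows one way to require every yellow letter by filtering once per letter.

```swift
import Foundation

// Hypothetical container standing in for wherever mainList lives.
struct WordPool {
    var mainList: [String]

    /// Returns the words that contain at least one of the bracketed letters.
    func yellow(_ yellowChars: String = "[SRE]") -> [String] {
        mainList.filter {
            $0.range(of: yellowChars, options: .regularExpression) != nil
        }
    }
}

let pool = WordPool(mainList: ["SHARE", "CLOUD", "MINTY", "BRAVO"])
let anyOfThree = pool.yellow()   // default "[SRE]" gives ["SHARE", "BRAVO"]

// Requiring every yellow letter: filter once per letter (an assumed variation).
var allOfThree = pool
for letter in ["S", "R", "E"] {
    allOfThree.mainList = allOfThree.yellow("[\(letter)]")
}
// allOfThree.mainList is ["SHARE"]
```

BRAVO survives the first call because it contains an R, even though it has no S or E. Whether that is what you want depends on how aggressively you want to shrink the pool.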


The Black Tiles

    /// Returns all items that contain none of the specified letters.
    func black(_ blackChars: String = "[EIOUY]") -> [String] {
        mainList.filter({ $0.range(of: blackChars, options: .regularExpression) == nil })
    }

This is very similar to the yellow method. Instead of including entries that match, entries are included only if there is not a match against any of the characters in the reference string.
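A quick sketch of the black filter in action, assuming a small invented word list held in a top-level mainList constant:

```swift
import Foundation

// Sample list, invented for this illustration.
let mainList = ["CRANE", "SLANT", "BLAST", "TOUCH"]

/// Returns the words that contain none of the bracketed letters.
func black(_ blackChars: String = "[EIOUY]") -> [String] {
    mainList.filter { $0.range(of: blackChars, options: .regularExpression) == nil }
}

let survivors = black()
// survivors is ["SLANT", "BLAST"]
```

CRANE is dropped because of its E, and TOUCH because of its O and U.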

The Green Tiles

I will provide two solutions using two different data structures to process the green tiles.

In this first (less object-oriented) approach I will be working with an array of optional strings. (I'll leave it as an exercise for the reader if you prefer working with optional characters.)

var solvedLetters: [String?] = [nil, nil, nil, nil, nil]

The advantage of this simplistic approach is that you have position information (all five tiles are represented, albeit without individual labels), which allows for direct iteration through the elements of the solvedLetters array.

Taking the passed in parameter and iterating through its elements, we can quickly reduce the size of the pool of candidates by eliminating all words that don't have green letters in the indicated position. We simply skip over processing an empty/nil element.

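    /// Keeps only the words whose letter at each solved position matches.
    func green(_ greenChars: [String?]) {
        for (index, item) in greenChars.enumerated() {
            // Skip positions that have not been solved yet.
            if let item {
                // String has no Int subscript in standard Swift, so index into its characters.
                mainList = mainList.filter({ String(Array($0)[index]) == item })
            }
        }
    }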
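For anyone curious about the optional-characters variation, here is one possible take, written as a standalone sketch. It is my own illustration and not the tutorial's code: the function takes the word list as a parameter and returns the filtered list instead of updating mainList in place, and the sample words are invented.

```swift
// Sample list, invented for this illustration.
let words = ["SHARE", "STORE", "SPARE", "CRANE"]

/// Keeps only the words that have each known letter in its known position.
/// Assumes every word has at least as many letters as `greenChars` has slots.
func green(_ greenChars: [Character?], in words: [String]) -> [String] {
    var remaining = words
    for (index, letter) in greenChars.enumerated() {
        guard let letter else { continue }   // nil means this position is unsolved
        remaining = remaining.filter { Array($0)[index] == letter }
    }
    return remaining
}

let firstAndLast = green(["S", nil, nil, nil, "E"], in: words)
// firstAndLast is ["SHARE", "STORE", "SPARE"]

let withMiddle = green(["S", nil, "A", nil, "E"], in: words)
// withMiddle is ["SHARE", "SPARE"]
```

Converting each word to an array of characters on every pass is wasteful, but with a pool this small I would rather keep the code readable.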

Personally, I prefer a more object-oriented solution, and my final solution makes use of a Tile object which includes its letter's position within the final solution.

[the second solution omitted in this excerpt]

Give Me Your Feedback!

If you followed along and found my explanations useful, let me know. If I glossed over a point you don't quite get, drop me a line. Click on my email link (below my avatar).

Potential Projects

Wordle
A SwiftUI Picker Using SF Symbols
FloodFill Algorithm (what is missing from the published project?)
Combine
The Swift Concurrency Model and Structured Concurrency
SwiftComposableArchitecture (TCA)
Parsing Markdown (Using the parser library from PointFree)
Xcode Unit Tests
SwiftUI Property Wrappers
SwiftUI View Builders
Javascript and iOS
Refactoring
JSON and REST APIs

As I mentioned earlier, I have a number of tutorials in various stages of completion, and I am also coming up with new ideas as I see new APIs and tools become available.

Links

Google Search

Mozahler - FloodFill

Wordle Solutions Changed

Mozahler

The Next Level